Simple reinforcement learning agents: Pareto beats Nash in an algorithmic game theory study

Authors

  • Steven Orla Kimbrough
  • Ming Lu
Abstract

Repeated play in games by simple adaptive agents is investigated. The agents use Q-learning, a special form of reinforcement learning, to direct learning of behavioral strategies in a number of 2×2 games. The agents are able to maximize the total wealth extracted effectively, which often leads to Pareto optimal outcomes. When the reward signals are sufficiently clear, Pareto optimal outcomes will largely be achieved. The effect can select Pareto optimal outcomes that are not Nash equilibria, and it can select Pareto optimal outcomes among Nash equilibria.
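
The setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-state (stateless) Q-learning, epsilon-greedy exploration, and a hypothetical Prisoner's Dilemma payoff table. The names `PAYOFFS`, `choose`, and `play`, and all parameter values, are illustrative assumptions.

```python
import random

# Hypothetical Prisoner's Dilemma payoffs (an assumption, not taken from the paper).
# PAYOFFS[(a_row, a_col)] = (reward_row, reward_col); action 0 = cooperate, 1 = defect.
PAYOFFS = {
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}


def choose(q, epsilon, rng):
    """Epsilon-greedy choice over two actions; ties go to action 0."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    return 0 if q[0] >= q[1] else 1


def play(payoffs, rounds=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Two independent single-state Q-learners repeatedly play a 2x2 game."""
    rng = random.Random(seed)
    q_row, q_col = [0.0, 0.0], [0.0, 0.0]
    counts = {joint: 0 for joint in payoffs}  # how often each joint action occurs
    for _ in range(rounds):
        a_row = choose(q_row, epsilon, rng)
        a_col = choose(q_col, epsilon, rng)
        r_row, r_col = payoffs[(a_row, a_col)]
        # Standard Q-learning update with a single (dummy) state.
        q_row[a_row] += alpha * (r_row + gamma * max(q_row) - q_row[a_row])
        q_col[a_col] += alpha * (r_col + gamma * max(q_col) - q_col[a_col])
        counts[(a_row, a_col)] += 1
    return q_row, q_col, counts


if __name__ == "__main__":
    q_row, q_col, counts = play(PAYOFFS)
    print(counts)  # frequency of each joint action over the run
```

Comparing the frequency of the mutually cooperative cell `(0, 0)` with that of the Nash cell `(1, 1)` in `counts` is one way to examine the kind of question the abstract raises. What this simplified variant converges to depends on the assumed parameters and state representation, so it should not be read as reproducing the paper's results.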

Similar resources

Multiagent Q-Learning: Preliminary Study on Dominance between the Nash and Stackelberg Equilibriums

Some game theory approaches to solve multiagent reinforcement learning in self play, i.e. when agents use the same algorithm for choosing action, employ equilibriums, such as the Nash equilibrium, to compute the policies of the agents. These approaches have been applied only on simple examples. In this paper, we present an extended version of Nash Q-Learning using the Stackelberg equilibrium to...

Learning Optimal Seller Strategies with Intelligent Agents: Application of Evolutionary and Reinforcement Learning

The role of automated agents in the electronic marketplace has been growing steadily and has been attracting a lot of research from the artificial intelligence community as well as from economists. We consider the problem of homogeneous sellers of a single raw material or component vying for business from a single large buyer, and present artificial agents that learn near-optimal seller strateg...

Multiagent Learning with Bargaining - A Game Theoretic Approach

Learning in the real world occurs when an agent, which perceives its current state and takes actions, interacts with the environment, which in return provides a positive or negative feedback. The field of reinforcement learning studies such processes and attempts to find policies that map states of the world to the actions of agents in order to maximize cumulative reward over the long run. In m...

Nonparametric General Reinforcement Learning

Reinforcement learning problems are often phrased in terms of Markov decision processes (MDPs). In this thesis we go beyond MDPs and consider reinforcement learning in environments that are non-Markovian, non-ergodic and only partially observable. Our focus is not on practical algorithms, but rather on the fundamental underlying problems: How do we balance exploration and exploitation? How do w...

A new consequence of Simpson’s paradox: Stable co-operation in one-shot Prisoner’s Dilemma from populations of individualistic learning agents

Normative theories of individual choice in economics typically assume that interacting agents should each act individualistically: i.e., they should maximize their own utility function. Specifically, game theory proposes that interaction should be governed by Nash equilibria. Computationally limited agents (whether artificial, animal or human) may not, however, have the capacity to carry out th...

Journal:
  • Inf. Syst. E-Business Management

Volume 3

Publication date: 2005